TITLE: SNR.EXE VERSION: 5.03
PURPOSE: Multi-string simultaneous Search 'N' Replace program
RELEASE DATE: 5/31/93
AUTHOR: THOMAS A. LUNDIN
16267 Hudson Avenue
Lakeville, MN 55044
day phone: (612) 588-7571
nights/weekends: (612) 431-5805
CompuServe ID [70523,262]
REGISTRATION: $50 per copy; volume and site licenses available.
Registered version discards the opening shareware
compliance screen, and a free multi-window DOS text
editor is bundled with the package.
Registration guarantees technical support; unregistered
users will receive support on a time-available basis.
Fill out and return the ORDER.FRM file along with your
payment to register your copy.
DESCRIPTION: SNR is a multi-string search-and-replace filter. Both
text and binary files can be processed by the program.
SNR will translate a file of any size that your system
can handle.
Version 5 of SNR brings many new advanced conversion
capabilities to the program, such as:
- wild-card conversion patterns
- variable-length conversion patterns
- start-of-file and end-of-file conversions
- text field padding
- table chaining
- enhanced context flags
Up to 2,500 multi-character (m:n) equations can be
entered into an SNR table, each of which can be a
maximum of 4,999 characters in length. Total conversion
space is limited to 64,000 characters. All search-and-
replace operations are performed in a single pass.
Context flag conversion allows toggling between two
different output strings for the same input string.
Wild card patterns allow non-specific character strings
to be matched and converted.
-1- SNR+ VER 5.03
REVISION HISTORY
5.02 - 5/26/93
Major rewrite. Added advanced conversion features such as wild
card pattern matching, variable-length patterns, field padding,
and so on. Allow file names to contain DOS wild cards, also.
Allow multiple-table chaining on the command line. Allow output
file subdirectory specification. Greatly expanded program
capacities. Allow 16 context flags. Change default bit-masking
from 7-bit to 8-bit, as this is the less destructive option.
4.0 - 7/19/90
Removed OS/2 compatibility. Nobody used it. Allow 256 1:1
equations to be loaded in addition to 50 m:n equations. Added
context flag.
3.01 - 11/3/89
Program recompiled for family mode, meaning it will run as-is
under DOS, OS/2 real mode, and OS/2 protected mode.
Also improved I/O throughput on fast disk drives.
3.0 - 9/27/89
The program has been completely rewritten. The pesky bugs that
were present in versions 1.5 and 2.0 have been eliminated.
All binary characters INCLUDING NULLS can now be searched and
replaced. Nulls can also be entered as part of a larger search-
and-replace equation.
An automatic "bit-stripping" feature has been added which clears
the high-order bit of each input character, forcing 8-bit codes
into 7-bit ASCII. The feature can be selectively disabled for
binary files.
The equation length has been increased to 200 characters, which
can be split into search and replace sides of any length, as long
as the total does not exceed 200.
Equations no longer need to be in any specific order.
The average speed of the program has been vastly improved,
especially for multiple and high-occurrence search and replaces.
The program no longer needs to read itself for runtime
parameters.
PROGRAM OPERATION
The command line invocation is:
snr [@]filename ext table[ table2...] [/d] [/opathname] [/r]
1. "filename" is the name of the DOS file you wish to convert. The
DOS wildcard characters "*" and "?" are allowed here, and they
will cause all files matching the name pattern to be
converted in one run. An at-sign (@) in front of the filename
indicates that the file contains a list of file names to convert.
There are two ways of creating the list of file names: the first
is to use the DOS DIR command and redirect its output to a file,
like this:
dir *.txt >dirfile
This will create a disk file named "dirfile" which contains a
directory list of all files with a ".TXT" extension. You can edit
this file if you want, to weed out certain files you don't want
and retain the others.
You can also create a list file manually by just creating a file
with a text editor and typing in the path and file names for each
of the files you wish converted. Example:
d:\data\test.txt
c:\bigfiles\tapedat.asc
c:\windows\win.ini
The converted output files will appear in your current directory.
You can optionally direct the output to another directory with
the /o option, explained later.
2. "ext" is the filename extension you wish to assign each output
file. The output files are created by adding this extension onto
the name of the input file. If you are converting multiple input
files from a list file, or from a wild card in the command line,
all of the resulting output files will have the same extension.
The choice of an extension is important; if you accidentally
choose an extension which is already used by another file, you
will overwrite the existing file, losing its data in the process.
SNR will not warn you of impending overwrites. Use an extension
which you are sure is unique.
Extensions can be the DOS device names CON, NUL, AUX, and PRN. In
any of these four cases, an output file is NOT created on disk;
rather, the output is redirected to the console (CON), nowhere
(NUL), the RS-232 serial port (AUX), or the printer (PRN). Using CON is
handy for a quick preview of the conversion process before
storing it to disk, although if you preview binary files to CON,
be aware that hex code 1A (DOS end-of-file) will terminate the
conversion display, perhaps prematurely. (This premature
termination will NOT occur when you store to disk -- it is an
artifact of displaying to the console.) Also, some binary
characters will set the PC speaker beeping when you display to
the screen.
3. "table" is the name (and optional path) of the file that contains
your string translation "equations". No restrictions are placed on
the table name aside from conforming to the DOS naming format, but
for the sake of clarity it is suggested that you adopt a consistent
naming scheme for your tables (say, with an *.S extension).
Creating conversion equations is discussed in more detail later.
You may specify more than one table at a time on
the command line; the effect of this is that SNR will convert
your input files through each table in sequence. The contents of
the output files will be just as if you had run SNR a number of
times in a row.
4. "/d" is an option which replaces the original input file with the
converted data. The extension which you specify on the command
line is used as a temporary intermediate file until the
conversion is complete, at which time the program deletes the
original input file and renames the temporary output file to the
input name.
5. "/o" is an option which allows you to place the output files in
some directory other than the current directory. The /o is
followed immediately by the name of the directory path (no spaces
in between).
6. "/r" is an option used only in conjunction with the /o option; it
is meaningless by itself. /r forces all of the output files in
the other directory to retain the same name as the original input
files. At the end of the conversion run using this option, you
will have two directories of files with the same names, but the
data in the files in one of the directories will have been
converted.
7. The options /o, /r, and /d may appear in a list file, after the
name of the file to convert. Example:
d:\data\file.1 /od:\
d:\data\file.2
c:\text\may.doc
In the above example, all of the files in the list will be
converted and stored on the root directory of drive D:. These
option codes may NOT appear in a list file created by redirecting
the DOS DIR command to a file.
COMMAND LINE EXAMPLES
C>snr tstfil.doc txt tst.s
The above command line will convert input file "tstfil.doc"
through table "tst.s" and create output file "tstfil.txt".
C>snr @dirlist p1 sample.s
The above command line will convert a group of files listed in
the file "dirlist" through table "sample.s" and create a group of
output files with extensions of *.p1.
C>snr *.c xxx pass1.s pass2.s pass3.s /d
The above command line will convert a group of files matching the
DOS file name pattern "*.c" through the three tables "pass1.s",
"pass2.s" and "pass3.s" in order, and the converted output files
will replace the original input files. (The temporary
intermediate files will have an extension of *.xxx.)
C>snr mytest.dat da2 pass1.s /od:\
The above command will convert the input file "mytest.dat"
through table "pass1.s" and create an output file named "mytest.da2"
in the root directory of drive D:.
CREATING CONVERSION TABLES
SNR tables are ASCII text files which contain the search-and-replace
equations used by the program. Any word processor or text editing
program capable of loading and saving plain ASCII text files will be
sufficient to create conversion tables. (The registered version of SNR
is bundled with a free multi-window DOS text editor for this purpose.)
Up to 2,500 of these equations can be entered in a single table, and
each equation can consist of up to 4,999 characters split freely
between the search side and the replacement side. Blank lines in a
table will be ignored. The maximum size of a table is limited to
64,000 characters. These equations will be matched against the input
file in a single pass, resulting in a converted output file.
EQUATION TYPES
This manual may refer to certain equation types from time to time. The
equation types that are handled by SNR are as follows:
1:0 an equation comprising one character on the left side and no
characters on the right side
1:1 an equation with one character on the left side and one character
on the right side
1:n an equation with one character on the left side and two or more
characters on the right side
m:0 an equation with two or more characters on the left side, and no
characters on the right side
m:1 an equation with two or more characters on the left side, and one
character on the right side
m:n an equation with two or more characters on the left side, and two
or more characters on the right side
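As a rough illustration, the classification above fits in a few lines of Python. This is a hypothetical helper, not part of SNR, and it assumes that each side is measured after hex codes such as \0d have been decoded to single characters.

```python
def equation_type(left: str, right: str) -> str:
    """Classify an equation as 1:0, 1:1, 1:n, m:0, m:1 or m:n.

    left and right are the decoded search and replacement sides."""
    if not left:
        raise ValueError("the search side needs at least one character")
    lhs = "1" if len(left) == 1 else "m"
    if len(right) == 0:
        rhs = "0"
    elif len(right) == 1:
        rhs = "1"
    else:
        rhs = "n"
    return f"{lhs}:{rhs}"


print(equation_type("A", "a"))               # 1:1
print(equation_type("Now is the time", ""))  # m:0
print(equation_type("\r\n\r\n", "\r\n"))     # m:n
```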
FORMAT OF EQUATIONS
The simplest form of conversion equation consists of literal search
and replace text separated by an equals sign. A sample 1:1 equation
would be:
A=a
The above equation would translate an upper-case 'A' to a lower-case
'a'.
A sample m:n equation would be:
Now is the time=NOW IS THE TIME
The above equation will translate the words "Now is the time" to all
upper-case. Notice that spaces ARE SIGNIFICANT characters in an
equation, so don't use them carelessly if you don't mean to have them
converted.
1:0 and m:0 equations discard the matched search string (nothing is
written to the output in its place). They are defined simply by
leaving out a replacement string,
like this:
Now is the time=
If you want to output word spaces at the end of a replacement string,
but your text editor strips trailing spaces, you can define them as
hex codes:
Now is the time =NOW IS THE TIME\20
In fact, any hex code can be formed from a backslash followed by two
hex digits (0-9, a-f, A-F). You'd normally use hex codes to search or
replace binary characters that can't be generated directly from the
keyboard. For instance, a carriage return/line feed sequence (CRLF)
can be specified like this:
\0d\0a\0d\0a=\0d\0a
The above equation will convert two CRLFs in a row to a single CRLF.
SNR RESERVED CHARACTERS
There are three ASCII characters which MUST be specified as hex codes
in an SNR equation, since they have special meaning in their normal
ASCII form. They are:
Backslash (\), which must be entered as \5c
Equals (=), which must be entered as \3d
Asterisk (*), which must be entered as \2a
If you forget to follow this notation, you will very likely trigger
error messages when you attempt to run a conversion.
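A minimal sketch of how a literal equation might be split and its hex codes decoded, in Python. The function names are hypothetical, and the sketch handles only literal characters and \hh codes; wild cards, context flags, comments and the repeat notation are deliberately left out. Because a real equals sign must be written as \3d, any raw "=" can be treated as the separator.

```python
import re

# A backslash followed by exactly two hex digits, e.g. \0d or \5c.
HEX_CODE = re.compile(r"\\([0-9A-Fa-f]{2})")


def decode_side(text: str) -> str:
    """Replace every \\hh code with the character it names (single pass,
    so a decoded \\5c is never re-read as the start of another code)."""
    return HEX_CODE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_equation(line: str) -> tuple[str, str]:
    """Split a literal/hex equation into decoded (search, replace) sides."""
    if line.count("=") != 1:
        raise ValueError("expected exactly one raw '=' in: " + line)
    left, right = line.split("=")
    return decode_side(left), decode_side(right)


print(parse_equation(r"\0d\0a\0d\0a=\0d\0a"))  # ('\r\n\r\n', '\r\n')
```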
EXTRA-LONG EQUATIONS
SNR can handle equation lengths up to 4,999 characters. It's quite
likely that you will never encounter this limit. Even so, your word
processor or text editor may have a much smaller limit on the number
of characters you can enter per line. Should you find you need to
carry an equation over from one line to the next, you can end a line
with \+ as the last item in the line, and SNR will automatically add
the next following line to the same equation. You can have as many
continuation lines as you need to. Example:
Now is the time for all good men=NOW IS THE TIME \+
\20FOR ALL GOOD MEN \ This is all one continued equation
Be mindful that any spaces that occur before the \+ continuation code
are ignored (an exception to the all-spaces-are-significant rule). Any
text that appears on the same line after the \+ code is ignored.
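The continuation rule can be sketched as follows. This is an illustrative Python helper (the name join_continuations is hypothetical) that assumes comments have already been removed and that a raw \+ can only be the continuation code, since a literal backslash must be written \5c.

```python
def join_continuations(lines: list[str]) -> list[str]:
    r"""Merge physical table lines that end with the \+ continuation code.

    Spaces in front of \+ are dropped and any text after it is ignored;
    lines without \+ are passed through unchanged."""
    logical: list[str] = []
    pending = ""
    for line in lines:
        marker = line.find("\\+")
        if marker >= 0:
            # Keep the text before \+, minus trailing spaces, and carry on.
            pending += line[:marker].rstrip(" ")
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)  # table ended in the middle of a continuation
    return logical


table = [
    "Now is the time for all good men=NOW IS THE TIME \\+",
    "\\20FOR ALL GOOD MEN",
]
print(join_continuations(table))
```

The \20 at the start of the second line is what preserves the word space, because the spaces typed before \+ are discarded.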
ENDING YOUR TABLES
The end of a table is signified by the end of your equations. You may
also place a \\E on a line by itself as an end-of-table marker. This
code is optional, but recommended, since it will prevent SNR from
inadvertently reading past the intended last line of your equations
(some word processors may pad their last blocks with garbage or with
an EOF character, which SNR would attempt to read as the start of a
new equation). A sure bet that you need to place the \\E code at the
end of your tables is if SNR occasionally aborts with the message:
Incomplete equation (missing right half):
...and you are unable to see that any of your equations is incomplete.
REPEATING CHARACTERS
SNR provides a shorthand notation for describing a large number of
repeating characters. Rather than entering an equation that looks like
this:
AAAAAAAAAAAAAAAAAAAAAAAAA=bbbbbbbbbbbbbbbbbbbb
...you can describe it like this:
*(25)A=*(20)b
Any character can be repeated using this notation. By changing the
repeat value in parentheses, you accomplish the same thing as typing
the repeat character that number of times. NOTE: SNR will actually
expand a shorthand notation into longhand notation before it begins
conversion, so make sure you don't inadvertently exceed the 4,999-
character equation limit with your repeat values.
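A sketch of the expansion step in Python, assuming the repeated item is either one literal character or one \hh hex code (repeats of wild card codes are not covered here). The name expand_repeats is hypothetical. A context flag such as *00 is left alone because it has no parentheses.

```python
import re

# *(count) followed by one repeated item: a \hh hex code or any single
# character other than a backslash.
REPEAT = re.compile(r"\*\((\d+)\)(\\[0-9A-Fa-f]{2}|[^\\])")


def expand_repeats(side: str) -> str:
    """Expand *(n)X shorthand into n copies of X (longhand notation)."""
    return REPEAT.sub(lambda m: m.group(2) * int(m.group(1)), side)


left = expand_repeats("*(25)A")
right = expand_repeats("*(20)b")
print(len(left), len(right))  # 25 20
```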
COMMENTING YOUR EQUATIONS
SNR provides for writing comments in your tables. Comments can be
entered in a table as lines by themselves, or set off to the right of
an equation, as in this example:
\ This is a comment line by itself.
\ A comment consists of a single backslash
\ followed by one or more spaces.
\0d\0a=\0d\0a \ this will ensure that existing CRLF pairs are
\ left untouched
\0d=\0d\0a \ this equation will convert an isolated CR
\ into a CRLF
\0a=\0d\0a \ this equation will convert an isolated LF
\ into a CRLF
\\E
When a comment follows an equation on the same line, any spaces that
occur before the comment are ignored; this is an exception to the all-
spaces-are-significant rule.
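Comment removal might look like this in Python. It is a hypothetical helper that assumes a comment always starts with a backslash followed by a space; that sequence cannot clash with an equation, because a literal backslash must be written \5c.

```python
def strip_comment(line: str) -> str:
    """Remove a comment (a backslash followed by a space) and the spaces
    in front of it. A line that is only a comment comes back empty."""
    start = line.find("\\ ")
    if start < 0:
        return line
    return line[:start].rstrip(" ")


for raw in [
    "\\ This is a comment line by itself.",
    "\\0d=\\0d\\0a   \\ this equation will convert an isolated CR",
    "A=a",
]:
    print(repr(strip_comment(raw)))
```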
If you're a programmer, or you've used a macro language of some kind,
you already know the value of program comments. If you're new to the
field of user programming, you should immediately get into the habit
of commenting your work. Believe me, you'll be glad you did, because a
time will come when you have to make a change to a table you created
months past, and you'll be left scratching your head over it if you
haven't commented the equations.
EQUATION ORDERING
SNR will automatically sort equations by length when assembling a
table in memory. SNR does not process equations in sequential order
when it converts a file, so the order in which you enter your
equations is largely immaterial. The exception to this rule is the
use of wild card patterns in your equations, which is explained
further on.
Equation processing is completed in a non-recursive fashion. That is,
once an equation has been matched and a replacement made, the replaced
data is not sent back for reconversion through another equation. It is
passed on to the output file exactly once.
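To make the single-pass, non-recursive behavior concrete, here is a small Python sketch. It assumes that the longest matching search string wins at each input position (suggested by the length sort, but not spelled out above) and works on already-decoded strings; the function name convert is hypothetical.

```python
def convert(data: str, equations: dict[str, str]) -> str:
    """Single-pass, non-recursive search and replace.

    At each position the longest matching search string wins (an
    assumption). Replacement text is appended to the output and is
    never scanned again."""
    # Longest first; empty search strings are skipped (they would never advance).
    keys = sorted((k for k in equations if k), key=len, reverse=True)
    out: list[str] = []
    i = 0
    while i < len(data):
        for key in keys:
            if data.startswith(key, i):
                out.append(equations[key])
                i += len(key)
                break
        else:
            out.append(data[i])  # no equation matched: copy the character
            i += 1
    return "".join(out)


print(convert("AB", {"A": "B", "B": "C"}))  # BC
```

With the table A=B and B=C, the input AB becomes BC: the B produced by the first equation goes straight to the output and is never fed to the second equation.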
CONTEXT FLAGS
There are probably instances where you'd like to toggle between two
different replacement strings for the same search string. The context
flag allows you to do this. Some example uses would be:
- to compress multiple spaces in a document into a single space
- to ignore any text between two codes
- to make one code alternate as two different codes
- to turn all-upper-case text into upper and lower
There are undoubtedly many other uses.
There are 16 context flags which can be used, and each context flag
has two states: on and off. The context flags are identified by the
numbers 0-9 and the letters a-f (for a total of 16 individual flags).
When SNR begins execution, the context flags' states are all OFF. A
flag's state can be tested in a search, and its state can be set or
reset in a replacement. A context flag is represented in an equation
by an asterisk, followed by the number or letter of the flag, followed
by a one or zero (for ON or OFF).
A context flag is the last item entered in a search or replacement
string. For example:
ABC*00=abc*01
ABC*01=XYZ*00
In the above two equations, if the string "ABC" is read from the input
file, and context flag 0 is OFF, then the string "abc" is written to
the output file, and context flag 0 is set ON. If, on the other hand,
the string "ABC" is read from the input file, and context flag 0 is
ON, then the string "XYZ" is written to the output file, and context
flag 0 is reset OFF. For example, if data from the input file looks
like this:
ABCDEFG ABCDEFG ABCDEFG ABCDEFG
...our small example table would convert it to this:
abcDEFG XYZDEFG abcDEFG XYZDEFG
Context flags are global in scope; this means that each flag is
accessible by many different equations in one table, and any equation
can test or set the flags. But beware of unwanted side effects --
remember that only one equation at a time will control the state of a
flag, and its state may change between two equations that complement
each other, resulting in misconversion. For example:
ABC*00=abc*01
ABC*01=XYZ*00
EFG*00=efg*01
EFG*01=ZYX*00
The above equations are similar to the previous example, except that
we have defined two new equations that test and set a single context
flag. So, given this example input data:
ABCDEFG ABCDEFG ABCDEFG ABCDEFG
...the result would convert to this:
abcDZYX abcDZYX abcDZYX abcDZYX
The context flag is being toggled by only two equations:
ABC*00=abc*01 <--- this one
ABC*01=XYZ*00
EFG*00=efg*01
EFG*01=ZYX*00 <--- and this one
If, on the other hand, the input data stream looked like this:
ABCDEFG DEFGABC ABCDEFG DEFGABC
...the result would look like this:
abcDZYX DefgXYZ abcDZYX DefgXYZ
This conflict between two sets of equations grappling over the same
context flag can be alleviated by using a different flag number for
one of the sets:
ABC*00=abc*01
ABC*01=XYZ*00
EFG*10=efg*11
EFG*11=ZYX*10
With a separate flag for each pair, the two pairs of equations no
longer interfere with each other. Now, this data stream:
ABCDEFG ABCDEFG ABCDEFG ABCDEFG
...will be converted to this:
abcDefg XYZDZYX abcDefg XYZDZYX
The thing to keep in mind if you are going to use a context flag is:
it is OFF when the program begins; you need to enter an equation to
set it ON; and then you need to have some equation that resets it OFF.
See some of the sample tables for examples in the use of context
flags.
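The flag behavior in the examples above can be reproduced with a short Python simulation. It is a sketch, not SNR's implementation: it assumes literal text only (no hex codes), at most one flag code at the end of either side, and longest-match-first selection among the equations whose flag test passes. The names split_flag and run_flagged are hypothetical.

```python
import re

# Optional trailing flag code: '*', a flag name 0-9 or a-f, then 0 (OFF) or 1 (ON).
FLAG_SUFFIX = re.compile(r"^(.*?)(?:\*([0-9a-f])([01]))?$")


def split_flag(side: str):
    """Split 'ABC*01' into ('ABC', '0', True); no flag gives (side, None, None)."""
    text, flag, state = FLAG_SUFFIX.match(side).groups()
    return text, flag, (state == "1") if state else None


def run_flagged(data: str, table: list[str]) -> str:
    """Apply equations such as 'ABC*00=abc*01' in a single pass.

    A flag on the search side is a test; a flag on the replacement side
    sets or resets that flag. All 16 flags start OFF."""
    rules = []
    for equation in table:
        left, right = equation.split("=")
        search, test_flag, test_on = split_flag(left)
        replace, set_flag, set_on = split_flag(right)
        if not search:
            raise ValueError("empty search string in: " + equation)
        rules.append((search, test_flag, test_on, replace, set_flag, set_on))
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)

    flags = dict.fromkeys("0123456789abcdef", False)
    out: list[str] = []
    i = 0
    while i < len(data):
        for search, test_flag, test_on, replace, set_flag, set_on in rules:
            if not data.startswith(search, i):
                continue
            if test_flag is not None and flags[test_flag] != test_on:
                continue  # flag is in the wrong state for this equation
            out.append(replace)
            if set_flag is not None:
                flags[set_flag] = set_on
            i += len(search)
            break
        else:
            out.append(data[i])  # nothing matched: copy through
            i += 1
    return "".join(out)


shared = ["ABC*00=abc*01", "ABC*01=XYZ*00", "EFG*00=efg*01", "EFG*01=ZYX*00"]
print(run_flagged("ABCDEFG DEFGABC ABCDEFG DEFGABC", shared))
# abcDZYX DefgXYZ abcDZYX DefgXYZ
```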
SPECIAL CONTEXT FLAG CODES
Certain types of conversions involving context flags can be very
tedious to define. Take the case of a table whose task is to compress
two or more adjacent blank spaces into a single space. Using our
context flags, we would start out by setting a flag when we first
encounter a space:
\20*00=\20*01
This equation states that if a space is encountered and flag 0 is OFF,
it will be converted to one space and flag 0 ON. Next we define what
happens to a space when flag 0 is ON:
\20*01=
This equation states that a space encountered while flag 0 is ON
will be ignored. That takes care of removing adjacent spaces, but
there's something missing. The way we have it set up now, EVERY
subsequent space will be removed, even between words, because no
equation ever sets flag 0 back OFF. Once the first space has set
flag 0 ON, only the second equation can ever match a space.
Since we have defined the case where adjacent spaces will be ignored,
we need to define the case where some character other than a space
will reset flag 0 so we can retain the single interword space
mentioned earlier.
Here's where the tedium comes in. The complete solution is to make
every character EXCEPT a space set flag 0 OFF, like this:
\00=\00*00
\01=\01*00
\02=\02*00
\03=\03*00
\04=\04*00
\05=\05*00
...
\7c=\7c*00
\7d=\7d*00
\7e=\7e*00
...
...and so on. Basically, you'd need 255 similar equations to finish
the task.
Specifically for these types of conversions, SNR provides two
specialty context flag codes: *ic and *ig.
IGNORE CONSECUTIVE
The Ignore Consecutive (*ic) flag code is used on the right-hand side
of the conversion equation. Its purpose is to allow you to specify any
set of characters that will be ignored when encountered adjacently in
a data stream. A side benefit of this notation is that SNR will
automatically generate all of the other flag reset equations that are
needed to complete the conversion. This vastly reduces the "clutter"
that would otherwise be present in your table. For instance, here is
our complete space compression table with the help of the *ic code:
\20=\20*ic\20
That's all there is to it. This equation reads: one space equals one
space, but ignore consecutive subsequent spaces.
You may define more than one character after the *ic code. In fact,
you may define as many characters after the *ic code as you care to.
Each defined character will be ignored if it occurs consecutively in
the data stream after matching the string on the left side of the
equation. Here's an example:
.\20.\20.=<ldr>*ic\20.
This equation will convert three periods separated by spaces into the
string <ldr>, and ignore any subsequent consecutive periods and
spaces. Thus, given the following input data:
Price. . . . . . . . . . . . . . . $25.00
...the equation would convert it to this:
Price<ldr>$25.00
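The effect of *ic can be approximated by skipping the ignore set right after a match, rather than by generating the flag-reset equations described above. A Python sketch under that assumption, using literal characters only (convert_ic is a hypothetical name):

```python
def convert_ic(data: str, equations: dict[str, tuple[str, str]]) -> str:
    """Single-pass replace where each equation may carry an *ic set.

    equations maps search -> (replacement, ignore_chars). After a match,
    any run of characters drawn from ignore_chars is skipped."""
    keys = sorted((k for k in equations if k), key=len, reverse=True)
    out: list[str] = []
    i = 0
    while i < len(data):
        for key in keys:
            if data.startswith(key, i):
                replacement, ignore = equations[key]
                out.append(replacement)
                i += len(key)
                # Ignore consecutive characters from the *ic set.
                while i < len(data) and data[i] in ignore:
                    i += 1
                break
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


print(convert_ic("a   b  c", {" ": (" ", " ")}))  # a b c
```

The space compression table \20=\20*ic\20 corresponds to the entry {" ": (" ", " ")}, and the leader example to {". . .": ("<ldr>", " .")}.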
IGNORE GENERAL
The other specialty context flag code is the Ignore General (*ig)
code. This code appears on the right-hand side of an equation and its
purpose is to cause all characters following the matched string to be
ignored, until an Ignore Cancel (*ix) flag code is encountered.
Example:
[=*ig
]=*ix
These two equations will cause anything falling between a left and
right square bracket to be ignored, including the brackets themselves.
Other characters falling outside of these delimiter characters will be
converted in normal fashion.
The *ig/*ix notation is the quickest way to remove enclosed groups of
characters from a data stream.
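A Python sketch of the [=*ig and ]=*ix pair as a two-state filter. The function name is hypothetical; it assumes that a "]" found outside brackets is also dropped, since the ]=*ix equation writes no characters of its own.

```python
def strip_between(data: str, start: str = "[", stop: str = "]") -> str:
    """Drop everything from start through stop, like the [=*ig / ]=*ix pair."""
    out: list[str] = []
    ignoring = False
    for ch in data:
        if ignoring:
            if ch == stop:
                ignoring = False  # ]=*ix: cancel ignoring; ']' is not output
            continue
        if ch == start:
            ignoring = True       # [=*ig: start ignoring; '[' is not output
        elif ch != stop:
            out.append(ch)        # a stray ']' also produces no output
    return "".join(out)


print(strip_between("Total [see note 3] 42"))  # Total  42
```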
START-OF-FILE AND END-OF-FILE CONVERSIONS
SNR provides for special conversions to occur before and after normal
conversion of a data stream. These conversions can be useful in adding
specific headers or trailers to a file, or in removing portions of the
beginning and end of a file.
START-OF-FILE CODE
To trigger an extra conversion at the start of the file, use the \s
start-of-file code. \s can be entered only as the first character on
the left-hand side of an equation. It may be used alone, or be
followed by other characters. Example:
\s=<BEGIN>\0d\0a
The above equation will output the string <BEGIN> as the first item in
a converted file.
\s=*ic\20
The above equation will ignore any group of consecutive spaces
occurring at the beginning of a file.
\sNow is the time=
The above equation will ignore the words "Now is the time" when they
occur at the beginning of a file.
END-OF-FILE CODE
The \q end-of-file code works similarly to the start-of-file code.
When it is used in an equation, it will trigger a conversion at the
very end of a file. \q must always stand by itself on the left-hand
side of an equation; other characters can not follow it. Example:
\q=[END OF JOB]\0d\0a
The above equation will output the string [END OF JOB] at the very end
of a converted file.
There is another use for the end-of-file code: to truncate a file when
a specific character is encountered in the data stream. (This
character may or may not occur at the true end of the file.) This is
very useful in cases where some programs add an end-of-file code to
any files they write, but you don't want them there. For instance, the
DOS COPY command adds a hex 1A to the end of a combined file when you
use the "+" operator to copy multiple files into one.
To use this truncation feature, you first need to define a hex
character that will be recognized specifically as the end-of-file
character. At the beginning of your table, you must place the
following setup code:
\\Qxx
...where "xx" is a two-digit hex code representing the character that
will trigger the end of file. For instance, the hex value 1A is
usually recognized as the DOS end-of-file character, so the setup code
for that case would be \\Q1A. The next step is to add the \q equation
to the table to actually process the end-of-file character:
\q=
This particular example will ignore the end-of-file character when it
is encountered in the data stream, but will terminate processing of
the file at that point, thus truncating the file.
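Assuming a table that begins with \\Q1A and contains \q= (or a \q equation with a trailer), the result can be sketched on raw bytes in Python. truncate_at_eof is a hypothetical name, and the trailer argument stands in for the \q replacement string.

```python
def truncate_at_eof(data: bytes, eof_code: int = 0x1A, trailer: bytes = b"") -> bytes:
    """Copy data up to (not including) the first eof_code byte, then append
    trailer. If eof_code never occurs, the whole input is kept."""
    cut = data.find(bytes([eof_code]))
    body = data if cut < 0 else data[:cut]
    return body + trailer


combined = b"first file\r\n\x1asecond file\r\n\x1a"
print(truncate_at_eof(combined))  # b'first file\r\n'
```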
SPECIAL \s and \q INFORMATION
Because of the way SNR assembles and categorizes conversion equations,
any table that contains a \q or \s equation must be of a type that
contains at least one 1:n or m:n equation (this is an equation type
that has one or more characters on the left side of the equation, and
two or more characters on the right side of the equation). \s and \q
equations will not work in a table that contains only 1:0 or 1:1
equation types, because the 1:0 and 1:1 tables use a special high-
speed array translation routine that bypasses the standard equation
processing routines.
A very simple way to coerce the table processor into categorizing a
1:1 table as an m:n type is to add the following equation to it:
\80\80=\80\80
This will have no adverse effect on the data stream, since it converts
two hex 80 characters into two hex 80 characters, but it will have the
desired effect in SNR's table type categorizing.
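The high-speed path for 1:0 and 1:1 tables behaves like a 256-entry lookup table. Python's bytes.translate offers the same idea, so a hypothetical sketch of the categorizing step might look like this. It is not SNR's code; note that a table containing \80\80=\80\80 (a 2-byte search string) makes the helper fall back, mirroring the trick described above.

```python
def build_fast_table(equations: dict[bytes, bytes]):
    """Return (table, delete) for bytes.translate when every equation is
    1:0 or 1:1; return None when any equation needs the general processor."""
    table = bytearray(range(256))  # identity mapping to start with
    delete = bytearray()
    for search, replace in equations.items():
        if len(search) != 1 or len(replace) > 1:
            return None                    # m:x or 1:n equation present
        if replace:
            table[search[0]] = replace[0]  # 1:1 equation
        else:
            delete.append(search[0])       # 1:0 equation
    return bytes(table), bytes(delete)


fast = build_fast_table({b"A": b"a", b"\x00": b""})
if fast is not None:
    table, delete = fast
    print(b"AB\x00A".translate(table, delete))  # b'aBa'
```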
WILD CARD CODES
Now we get to the fun stuff. While SNR's string conversion
capabilities are quite powerful on their own, there are inevitably
times when it is necessary to perform conversions based on ambiguous
patterns of characters, rather than on literal characters themselves.
An example here would be if you had a comma-delimited ASCII file that
was created by a database program, and you noticed that none of the
phone numbers contained hyphens in them.
It would be quite unfeasible to attempt to literally define all of the
possible phone numbers without hyphens, and convert them into phone
numbers with hyphens. You would end up with about 7 million equations
that went something like this:
2220000=222-0000
2220001=222-0001
2220002=222-0002
...
9989997=998-9997
9989998=998-9998
...hopefully, you get the idea. To help prevent such exercises in
futility, SNR provides a number of wild card codes that can be used in
place of literal characters in conversion equations. The wild card
codes allow you to search and replace on a set of characters rather
than on a specific character. For instance, one of the wild card codes
can be used to search for any of the numeric characters 0-9. Another
can be used to search for any of the alphabetic characters A-Z.
There are 12 wild card codes in all:
\p any character
\n numeric (0-9)
\x full alphabetic (A-Z, a-z)
\u upper case alphabetic (A-Z)
\y lower case alphabetic (a-z)
\m alphanumeric (A-Z, a-z, 0-9)
\t punctuation
\g non-whitespace
\o whitespace (space, tab, return, line feed, vertical tab,
form feed)
\z ASCII-only (c < ASCII 128)
\v printable (c > ASCII 31)
\k non-printable (c < ASCII 32)
The least restrictive of these, \p, will match any character that is
encountered in the data stream. The most restrictive of these, \o,
will match only the set of a few "whitespace" characters. The other
wild card codes fall between these two extremes, and their associated
character sets should be fairly straightforward to understand.
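One way to model the twelve wild card sets is as Python predicates on a single character. The exact membership of \t (punctuation) and \g (non-whitespace) is not spelled out above, so this sketch assumes string.punctuation for \t and the complement of the \o set for \g; the WILDCARDS name is hypothetical.

```python
import string

WHITESPACE = " \t\r\n\v\f"  # space, tab, CR, LF, vertical tab, form feed

# One predicate per wild card code; each takes a single character.
WILDCARDS = {
    "p": lambda c: True,                                     # any character
    "n": lambda c: c in string.digits,                       # 0-9
    "x": lambda c: c in string.ascii_letters,                # A-Z, a-z
    "u": lambda c: c in string.ascii_uppercase,              # A-Z
    "y": lambda c: c in string.ascii_lowercase,              # a-z
    "m": lambda c: c in string.ascii_letters + string.digits,
    "t": lambda c: c in string.punctuation,                  # assumed set
    "g": lambda c: c not in WHITESPACE,                      # assumed: not \o
    "o": lambda c: c in WHITESPACE,
    "z": lambda c: ord(c) < 128,
    "v": lambda c: ord(c) > 31,
    "k": lambda c: ord(c) < 32,
}

print([code for code, test in WILDCARDS.items() if test("7")])
# ['p', 'n', 'm', 'g', 'z', 'v']
```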
On the search side of an equation, a wild card code can be used as a
straight substitute wherever a literal or hex character would be used.
For instance, if you wanted to delete any pattern of three numbers
contained in parentheses, you would enter the following equation:
(\n\n\n)=
This would delete such occurrences as (612), (000), (123) and so on.
On the replacement side, however, wild card code usage requires some
explanation -- here you can't simply use a wild card code in place of
a literal character, as you can in the search string.
SNR allows you to rearrange the order of wild card characters during
conversion, so we need some method of numbering or otherwise
identifying the position of each wild card code in the search string
so it can be accessed in any order in the replacement string.
This is done easily enough by convention: SNR sequentially numbers the
characters in a search string. The leftmost search character is
position 0, the next one is position 1, the next is position 2, and so
on to the end of the string. The equation:
(\n\n\n)=
...contains five characters: The left parenthesis is character 0, the
wild card codes are characters 1, 2 and 3, and the right parenthesis
is character 4. To use a wild card in a replacement string, then, we
define the position of the character we wish to output, like this:
(\n\n\n)=\p1\p2\p3
The \p code, when used in a replacement string, is always followed by
the position number of the character you wish to output. In the above
example, when three numbers enclosed in parentheses are encountered in
the data stream, the parentheses will be thrown out and only the three
numbers will remain (in the same order in which they were defined in
the search string). If we change the equation to read like this:
(\n\n\n)=\p3\p1\p2
...the three numbers will be output in a different order.
Let's return to the example we opened this section with -- that of
replacing a series of phone numbers to add a hyphen between the third
and fourth digits. Now that we have wild card codes available as a
conversion tool, the task becomes much simpler:
\n\n\n\n\n\n\n=\p0\p1\p2-\p3\p4\p5\p6
Look at the equation: seven numbers on the search side will be
replaced by the first three numbers, followed by a hyphen, followed by
the last four numbers. If the data stream contained 4315805, the
equation would convert it to 431-5805. We could even handle area
codes:
\n\n\n\n\n\n\n\n\n\n=(\p0\p1\p2) \p3\p4\p5-\p6\p7\p8\p9
The equation as written defines 10 numbers on the search side to be
replaced by: a left parenthesis, the first three numbers, a right
parenthesis and a space, the next three numbers, a hyphen, and the
last four numbers. If the data stream contained 6124315805, the
equation would convert it to (612) 431-5805.
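A minimal Python sketch of one wild card equation with positional \pN output. It supports literal characters plus the \n, \u, \x and \p search codes, treats all digits after \p as the position number (an assumption), and is not SNR's actual matcher; compile_search and apply_wild are hypothetical names.

```python
import re

# A few of the wild card sets from the list of wild card codes.
WILD = {
    "p": lambda c: True,
    "n": lambda c: "0" <= c <= "9",
    "u": lambda c: "A" <= c <= "Z",
    "x": lambda c: "A" <= c <= "Z" or "a" <= c <= "z",
}
SEARCH_TOKEN = re.compile(r"\\([a-z])|(.)", re.DOTALL)  # \code or literal
REPLACE_TOKEN = re.compile(r"\\p(\d+)|(.)", re.DOTALL)  # \pN or literal


def compile_search(search: str):
    """Return one predicate per character position of the search string."""
    preds = []
    for code, literal in SEARCH_TOKEN.findall(search):
        if code:
            preds.append(WILD[code])
        else:
            preds.append(lambda c, lit=literal: c == lit)
    if not preds:
        raise ValueError("empty search string")
    return preds


def apply_wild(data: str, search: str, replace: str) -> str:
    """Single pass: each match is rewritten using position references."""
    preds = compile_search(search)
    parts = REPLACE_TOKEN.findall(replace)
    width = len(preds)
    out: list[str] = []
    i = 0
    while i < len(data):
        window = data[i:i + width]
        if len(window) == width and all(p(c) for p, c in zip(preds, window)):
            for pos, literal in parts:
                out.append(window[int(pos)] if pos else literal)
            i += width
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


print(apply_wild("6124315805", r"\n\n\n\n\n\n\n\n\n\n",
                 r"(\p0\p1\p2) \p3\p4\p5-\p6\p7\p8\p9"))  # (612) 431-5805
```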
As an aid to equation readability, repeated numbers of wild card codes
can use the repeating-character notation described earlier under
REPEATING CHARACTERS:
*(10)\n=(\p0\p1\p2) \p3\p4\p5-\p6\p7\p8\p9
This eliminates the need for you to specifically count how many \n
codes you've entered in a search string. The above equation has
exactly the same meaning as the previous equation. In a replacement
string, you can also use the repeating-character notation, so long as
you identify which position is being initially output:
*(10)\n=(*(3)\p0) *(3)\p3-*(4)\p6
Look at the equation; it defines 10 numeric characters to be replaced
by: a left parenthesis, three numeric characters starting at position
0, a right parenthesis, a space, three numeric characters starting at
position 3, a hyphen, and four numeric characters starting at position
6. This is exactly the same equation as the one defined three
paragraphs ago, but it is shorter and perhaps easier to read.
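The equivalence between the long and short notations can be checked mechanically. This Python sketch (the function expand_repeats is hypothetical) rewrites *(n) prefixes into long form. It assumes that *(n)\pK always means consecutive positions starting at K, and that any other code or literal is simply repeated n times.

```python
import re

def expand_repeats(side):
    r"""Rewrite *(n) shorthand in one side of an equation into long form.

    *(n)\pK becomes \pK \pK+1 ... \pK+n-1 (consecutive positions); any other
    item (a code such as \n, a hex code such as \20, or a literal) is
    repeated n times.  Nested or other forms are not handled.
    """
    def expand(m):
        count, item, start = int(m.group(1)), m.group(2), m.group(3)
        if start is not None:   # a \p position code: count upward
            return "".join(f"\\p{int(start) + i}" for i in range(count))
        return item * count
    return re.sub(r"\*\((\d+)\)(\\p(\d+)|\\[0-9A-Fa-f]{2}|\\.|.)", expand, side)

print(expand_repeats(r"*(10)\n"))                      # \n\n\n\n\n\n\n\n\n\n
print(expand_repeats(r"(*(3)\p0) *(3)\p3-*(4)\p6"))    # (\p0\p1\p2) \p3\p4\p5-\p6\p7\p8\p9
```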
Which notation you ultimately use -- long or short -- is purely a
matter of personal taste and appropriateness for the equation at hand.
Some equations will be more readable using the "long" notation, while
others will be a natural fit for the "short" version. There is no need
to dogmatically rely upon one or the other.
MIXING WILD CARD CODES
Wild card codes can be intermixed in an equation. For instance, the
following equation will convert a comma, a space, two upper case
letters, and five numbers into: a comma, a space, two upper case
letters, two spaces, and five numbers:
, \u\u\n\n\n\n\n=, \p2\p3  *(5)\p4
So if this string of characters was encountered in the data stream:
, MN55044
...the equation would convert it to:
, MN  55044
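To make the mixing of literals and wild card codes concrete, here is a minimal Python emulation of a single fixed-length equation. It is a sketch under assumptions, not SNR's implementation. The CLASSES table guesses the meaning of each code from this manual's descriptions, the *(5) shorthand is written out in long form, and two spaces are output between state and zip as the text describes.

```python
import re

# Hypothetical meanings for a few wild card codes, inferred from this manual
# (ASCII only).  Literal characters must match themselves exactly.
CLASSES = {
    r"\n": lambda c: "0" <= c <= "9",                       # numeric
    r"\u": lambda c: "A" <= c <= "Z",                       # upper case
    r"\y": lambda c: "a" <= c <= "z",                       # lower case
    r"\x": lambda c: "A" <= c <= "Z" or "a" <= c <= "z",    # alphabetic
    r"\v": lambda c: " " <= c <= "~",                       # printable
    r"\p": lambda c: True,                                  # any character
}

def convert(search, replace, text):
    """One left-to-right pass applying a single fixed-length equation.

    Only the codes in CLASSES, literal characters, and \\pN in the
    replacement are understood; hex codes such as \\20 are not.
    """
    tokens = re.findall(r"\\.|.", search, flags=re.DOTALL)
    out, i = [], 0
    while i < len(text):
        chunk = text[i:i + len(tokens)]
        if len(chunk) == len(tokens) and all(
                CLASSES[t](c) if t in CLASSES else t == c
                for t, c in zip(tokens, chunk)):
            # \pN outputs the Nth character of the matched chunk (zero-based)
            out.append(re.sub(r"\\p(\d+)", lambda m: chunk[int(m.group(1))], replace))
            i += len(tokens)
        else:
            out.append(text[i])   # no match here: pass the character through
            i += 1
    return "".join(out)

print(convert(r", \u\u\n\n\n\n\n", r", \p2\p3  \p4\p5\p6\p7\p8", "Lakeville, MN55044"))
# Lakeville, MN  55044
```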
CONVERTING SHIFT CASE
Aside from \p, two other wild card codes can be used in a replacement
string. Their purpose is to alter the shift case of an alphabetic
character. The codes are \u and \y, for upper- and lower-case conversion. As
an example, we can make the sample equation dealing with the state and
zip code used earlier a lot more universal with these new replacement
wild cards:
, \x\x\n\n\n\n\n=, \u2\u3  *(5)\p4
This new equation will convert a comma, a space, two alphabetic
characters of undetermined case, and five numbers into: a comma, a
space, two upper case characters, two spaces, and five numbers.
So if the following items were encountered in the data stream:
, MN55044
, Ny10022
, il60301
, nJ07680
...the equation would convert them to:
, MN  55044
, NY  10022
, IL  60301
, NJ  07680
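The case-converting replacement codes have a close regular-expression analogue. In this hypothetical Python sketch, \x\x becomes two ASCII letters of either case, \u2\u3 becomes .upper() on the captured letters, and the zip code is copied through unchanged.

```python
import re

# ", " + \x\x (two ASCII letters, any case) + five digits
STATE_ZIP = re.compile(r", ([A-Za-z]{2})([0-9]{5})")

def normalize_state_zip(text):
    # \u2\u3 -> upper-case the letters; two spaces; *(5)\p4 -> digits as-is
    return STATE_ZIP.sub(lambda m: f", {m[1].upper()}  {m[2]}", text)

for item in [", MN55044", ", Ny10022", ", il60301", ", nJ07680"]:
    print(normalize_state_zip(item))
```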
Here is a set of equations that will convert a text file that is
entirely in upper case into one that consists of upper and lower case,
in sentence style:
\u*00=\p0*01
\u*01=\y0
.=.*00
In this table, when an upper case letter is encountered in the data
stream, and flag 0 is OFF, it is output as-is and flag 0 is set ON. If
an upper case letter is encountered and flag 0 is ON, it is converted
to a lower case letter. If a period is encountered, it is output as-is
and flag 0 is set OFF, so the next upper case letter will be output
as-is. Any other characters in the file will be passed through as-is.
From this example data stream:
NOW IS THE TIME. FOR ALL GOOD MEN. TO COME TO THE AID.
...the equation would produce:
Now is the time. For all good men. To come to the aid.
How would you alter the above small table to convert a file from all
upper case to initial upper case, where the first letter of each word
is capitalized? Stated another way, what character appears before each
word that could be used to control the flag? Here's the answer:
\u*00=\p0*01
\u*01=\y0
.=.*00
\20=\20*00
Just by adding the last line to the table, a space will set flag 0
OFF, so the next upper case letter will be output as-is (in upper case).
Now the previous sample data stream will be converted to:
Now Is The Time. For All Good Men. To Come To The Aid.
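The flag logic of both tables can be traced with a small state machine. This Python sketch (the function recase and its reset_chars parameter are hypothetical) assumes flag 0 starts OFF, which is what the sample output implies.

```python
def recase(text, reset_chars="."):
    r"""Emulate the flag-0 table \u*00=\p0*01, \u*01=\y0, .=.*00.

    Pass reset_chars=". " to add the \20=\20*00 line (initial caps).
    Assumes flag 0 starts OFF and only A-Z count as upper case.
    """
    flag0 = False
    out = []
    for c in text:
        if "A" <= c <= "Z":
            # flag OFF: \p0 (output as-is); flag ON: \y0 (lower case)
            out.append(c.lower() if flag0 else c)
            flag0 = True   # *01; in the \y0 case the flag is simply left ON
        else:
            out.append(c)  # all other characters pass through unchanged
            if c in reset_chars:
                flag0 = False   # *00
    return "".join(out)

sample = "NOW IS THE TIME. FOR ALL GOOD MEN. TO COME TO THE AID."
print(recase(sample))        # Now is the time. For all good men. To come to the aid.
print(recase(sample, ". "))  # Now Is The Time. For All Good Men. To Come To The Aid.
```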
WILD CARDS AND \s
You can use wild cards in conjunction with the \s start-of-file code
to remove a number of characters from the beginning of a file:
\s*(256)\p=
The above equation will remove the first 256 characters (of any type)
from a file.
BEWARE OF SIDE EFFECTS
Wild card codes are powerful tools. They can also be powerfully
troublesome if not used with care. Keep in mind that these wild card
codes have a hierarchy of pattern-matching: they range from highly
inclusive to highly restrictive. If you have two or more wild card
equations in your table, be alert to the possibility that one of the
equations might end up taking precedence over another, leading to
unintended conversions.
SNR bases its pattern-matching tests on the total length of search
strings. That is, for any string in the data stream that could
potentially match two or more table equations, the equation that is
longest will be tried first, then the next longest, and so on down
the line until one of them matches or no candidates remain.
So, for any two equations that begin with the same character, the
longer one will be tested before the shorter one. If the longer one
happens to match the data stream pattern, it is replaced immediately,
and none of the other equations will be tested.
If two wild card equations have the same search string length but
begin with a different wild card code, the equation with the wild card
code that is more restrictive will be tested first. That is, if an
equation beginning with \x and one beginning with \v are both the same
length, the one beginning with \x will be tested against the data
stream first, because the set of alphabetic characters is more
exclusive than the set of all printable characters.
Design your wild card search strings carefully to take this behavior
into consideration, and your conversions will be more successful.
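The ordering rule can be sketched as a sort key: longest search string first, then the more restrictive leading code. The RANK values below are assumptions. This manual only states that \x is more restrictive than \v and that \n is more restrictive than \p; the function name trial_order is hypothetical.

```python
import re

# Assumed restrictiveness ranks (lower = more restrictive).  The manual only
# says \x is more restrictive than \v and \n more restrictive than \p; the
# remaining values are guesses.  Literal characters default to rank 0.
RANK = {r"\n": 1, r"\u": 1, r"\y": 1, r"\t": 1, r"\x": 2, r"\v": 3, r"\p": 4}

def trial_order(searches):
    """Order search strings the way this section says SNR tries them."""
    def key(search):
        # hex codes such as \20 count as one character, like other codes
        tokens = re.findall(r"\\[0-9A-Fa-f]{2}|\\.|.", search, flags=re.DOTALL)
        return (-len(tokens), RANK.get(tokens[0], 0))   # longest first, then rank
    return sorted(searches, key=key)

print(trial_order([r"\v\v\v", r"\n\n", r"\x\x\x"]))
# ['\\x\\x\\x', '\\v\\v\\v', '\\n\\n']
```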
VARIABLE-LENGTH CONVERSIONS
Up to now, the conversions that we have been discussing have dealt
with fixed-length patterns of characters: some number of characters in
a search string gets converted to some other number of characters in a
replacement string, and we have been able to specifically count the
number of characters on either side of the equation.
Now we are going to explore the method SNR has of defining repeating
characters or wild card codes whose actual number is unknown. These
types of equations are called "variable-length" conversions.
Variable-length conversions are used when you want to search on some
identifiable set of characters -- maybe a line of text, maybe a
phrase, maybe a command parameter -- but you don't know HOW MANY
characters will be involved in the actual pattern.
For instance, a line of text might contain one word, or it might
contain fifty words. An embedded text command might contain one
parameter, or it might contain twelve parameters. A comma-delimited
database field might contain ten characters, or it might contain
twenty-five characters.
The one thing that these types of text patterns -- a line, a
parameter, a field -- have in common is that we wish to treat each of
them as a unit, even though we don't know how many characters they
will contain.
Using variable-length conversions, you can define these higher-level
units of data and translate them, throw them away, or rearrange them
as needed.
SEARCH STRING FUNDAMENTALS
A variable-length string is a run of one specific type of character
(literal or wild card) that matches anywhere from zero up to a stated
maximum number of such characters in the data stream.
Since variable-length search strings are basically unspecified lengths
of similar characters, we can draw upon the notation that we
prescribed earlier for the repeating-character strings:
*(40)\20
The above portion of a search string should be familiar to you as the
definition for 40 spaces. To turn this into a variable-length
definition, we add the \^ variable-length code:
\^*(40)\20
This search string fragment now defines "up to" 40 spaces. Likewise,
the following fragment, which defines a string of 25 alphabetic characters:
*(25)\x
...can be redescribed in variable-length terms like this:
\^*(25)\x
This search fragment now defines a string of "up to" 25 alphabetic
characters.
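If you are familiar with regular expressions, you can think of \^*(n) as a bounded quantifier that matches zero to n items. Here is a hypothetical Python translation of a single fragment. The ASCII meanings given to the codes are assumptions.

```python
import re

# Regex analogies for some wild card codes (assumed meanings, ASCII only).
CLASS_REGEX = {
    r"\n": "[0-9]",        # numeric
    r"\u": "[A-Z]",        # upper case
    r"\y": "[a-z]",        # lower case
    r"\x": "[A-Za-z]",     # alphabetic
    r"\v": "[ -~]",        # printable
    r"\p": r"[\s\S]",      # any character
}

def variable_fragment(max_len, code):
    r"""Translate \^*(max_len)<code> into a regex matching 0..max_len items."""
    if code in CLASS_REGEX:
        item = CLASS_REGEX[code]
    elif re.fullmatch(r"\\[0-9A-Fa-f]{2}", code):   # hex code such as \20
        item = re.escape(chr(int(code[1:], 16)))
    else:                                           # a literal character
        item = re.escape(code)
    return f"(?:{item}){{0,{max_len}}}"

spaces = variable_fragment(40, r"\20")       # \^*(40)\20
letters = variable_fragment(25, r"\x")       # \^*(25)\x
print(bool(re.fullmatch(spaces, " " * 40)), bool(re.fullmatch(spaces, " " * 41)))  # True False
print(bool(re.fullmatch(letters, "")), bool(re.fullmatch(letters, "abc1")))        # True False
```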
It is important to adhere to the following rules when you use
variable-length search strings:
- An equation cannot begin with a variable-length search string.
- Each variable-length search string must be followed by a fixed-
length terminator string.
These restrictions are not quite so arbitrary as they may seem. As for
the first rule, each equation must have some character or set of
characters of fixed length that can begin the search pattern. The
variable-length string can come right after this. As for the second
rule, there must be something to tell SNR when the end of a variable-
length string has been met; some specific character or set of
characters must be defined that can act as a terminator to the
variable-length string. The second rule also implies that the search
side of an equation cannot end with a variable-length string.
So here is a complete example of a valid variable-length search
string:
\v\^*(80)\v\0d\0a=
Look at the equation. It defines: a printable character followed by up
to 80 more printable characters terminated by a carriage return/line
feed, to be thrown away. The effect of this equation will be to throw
away full lines that are less than 82 characters long, or to throw
away the last 81 characters (and the line ending) of a longer line.
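The same equation can be approximated as a Python regular expression, assuming \v means printable ASCII (hex 20 through 7E). Python's regex engine backtracks, which may not mirror SNR's matcher in every case. In this model a 100-character line keeps only its first 19 characters. The name drop_lines is hypothetical.

```python
import re

# \v\^*(80)\v\0d\0a= as a regex, assuming \v means printable ASCII (0x20-0x7E):
# one printable character, up to 80 more, then CR/LF, replaced by nothing.
DROP_SHORT_LINE = re.compile(r"[ -~][ -~]{0,80}\r\n")

def drop_lines(text):
    return DROP_SHORT_LINE.sub("", text)

print(repr(drop_lines("abc\r\nkeep")))               # 'keep'
print(len(drop_lines("x" * 100 + "\r\n")))           # 19
```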
Multiple variable-length strings can appear in an equation:
"\^*(30)\v","\^*(30)\v","\^*(30)\v"\0d\0a=
The above equation will search for: a quote followed by up to 30
printable characters followed by a quote-comma-quote terminator
followed by up to 30 more printable characters followed by another
quote-comma-quote terminator followed by up to 30 more printable
characters terminated by a quote and carriage return/line feed, and
throw it all away. You may recognize this particular pattern as being
characteristic of the comma-delimited ASCII file that many database
and mail-merge programs support.
The two examples we used above employed literal characters as
terminator strings. It is equally valid to use wild card codes as
terminator strings:
\n\^*(20)\n\t=
The above equation defines a number followed by up to 20 more numbers
followed by a punctuation character to be thrown away.
When you use wild card codes as a terminator to a variable-length
string which itself uses wild cards, remember the earlier admonition
concerning the hierarchy of wild card pattern matching. It applies
equally here. Specifically, a wild card terminator must be more
restrictive in type than the variable-length wild card string it is
terminating.
For instance, the following variable-length string is invalid:
\n\^*(10)\n\p=
The \p code used as the terminator is more inclusive than the \n code
it terminates. The result will be that this equation will generate
incorrect output, if indeed it outputs anything at all.
REPLACEMENT STRING FUNDAMENTALS
The prior discussion concerned itself exclusively with the procedures
for defining the search side of an equation. Now we will turn our
attention to defining the replacement side of a variable-length
equation.
Since we don't know the actual number of characters that a variable-
length string will pick up, it is not possible to use the \p notation
to identify precise character positions for output. Instead, we
identify groups of variable-length strings by the positions of the
group: the first variable-length string on the search side is number
one, the next such string in the equation is number two, and so on.
(This is in contrast to the \p code, which begins at zero.)
Up to 40 variable-length strings may be entered in a single equation.
The variable-length replacement strings are defined as \^1, \^2, \^3
and so on. Let's take an example based on the comma-delimited ASCII
file, used earlier:
"\^*(30)\v","\^*(30)\v","\^*(30)\v"\0d\0a=\^1\09\^2\09\^3\0d\0a
While the search side of the equation remains the same as before, we
have now introduced the replacement side. The replacement will be the
first variable-length string followed by a tab code (the hex 09)
followed by the second variable-length string followed by a tab code
followed by the third variable-length string followed by a carriage
return/line feed. The effect of this equation will be to convert a
comma-quote delimited file into a tab-delimited file. (Digression: for
purposes of illustration this example is fine, but is there a better
way to accomplish the same result? The answer appears in a later
section.)
You may already have realized that it is possible to rearrange the
strings by changing the order in which they appear on the replacement
side of the equation. The procedure is straightforward and obvious, so
I won't bother with an illustration.
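The following Python regex analogue sketches both the tab conversion and a rearrangement. Each capture group plays the role of \^1, \^2 or \^3. The names are hypothetical, and the sketch assumes that fields contain no embedded quotes.

```python
import re

# Assumption: fields contain no embedded quotes, so each \^*(30)\v field is
# modeled as up to 30 printable ASCII characters other than the quote (0x22).
FIELD = r"([ !#-~]{0,30})"
RECORD = re.compile(f'"{FIELD}","{FIELD}","{FIELD}"\r\n')

def to_tabs(text, order=(1, 2, 3)):
    r"""Replace each record with fields \^1, \^2, \^3 (in the given order),
    separated by tabs (\09) and followed by CR/LF."""
    return RECORD.sub(lambda m: "\t".join(m[i] for i in order) + "\r\n", text)

print(repr(to_tabs('"Smith","John","MN"\r\n')))                    # 'Smith\tJohn\tMN\r\n'
print(repr(to_tabs('"Smith","John","MN"\r\n', order=(2, 1, 3))))   # 'John\tSmith\tMN\r\n'
```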
How do you output a wild card character that has been defined as a
terminator? Since you can't pick the character out of the data stream
by exact position, SNR provides a notation that describes a terminator
string. Since each variable-length string in an equation is numbered
from 1 to 40, and by definition each variable-length string must be
followed by a terminator string, it makes sense that the terminator
strings are also numbered from 1 to 40.
The replacement notation for terminator strings is \^:1, \^:2, \^:3
and so on (the colon being the only difference between them and the
variable-length string identifier).
Now having some way of differentiating between variable-length strings
and terminators, we can convert equations such as the following:
\u\^*(15)\y\u\^*(15)\y\t=\p0\^1\^:1\20\^2\^:2
This admittedly contrived example serves to illustrate the placement
and usage of the various codes. The search side defines an upper case
character followed by up to 15 lower case characters terminated by an
upper case character followed by up to 15 more lower case characters
terminated by a punctuation character. The replacement side outputs
the first character of the string (the first \u on the search side)
followed by the first variable-length string followed by the first
terminator followed by a space followed by the second variable-length
string followed by the second terminator.
For a concrete example, assume that this string occurs in the data
stream:
SuperDuper! UnquestionablyAwesome.
Our example equation will convert it to this:
Super Duper! Unquestionably Awesome.
Let's try switching the order of a few elements in the equation:
\u\^*(15)\y\u\^*(15)\y\t=\^:1\^2\20\p0\^1\^:2
The search side remains the same; the replacement side has changed.
Study it and see if you can follow the meaning. Perhaps this will
help: given the same example data stream as above, the altered
equation will produce this output:
Duper Super! Awesome Unquestionably.
PADDING VARIABLE-LENGTH STRINGS
A specialized use of variable-length replacement strings allows the
strings to be padded out to their maximum width. Normally, of course,
the strings are output to their "natural" width -- however many
characters are contained in the data stream are what gets output
during replacement.
Padding the strings adds blank spaces to the left or right of the text
proper, and has the effect of aligning the text within a field that is
as wide as the maximum width specified in the variable-length search
string.
To invoke variable-length padding, the alignment codes R and L are
inserted before the variable-length string number in a replacement
string. For instance, \^L1 will left-align the first variable-length
string; \^R3 would right-align the third variable-length string.
Here's how the codes are used in an equation:
"\^*(30)\v","\^*(30)\v","\^*(30)\v"\0d\0a=\^L1\^L2\^R3\0d\0a
Taking our well-worn comma-delimited example yet again, pay particular
attention to the replacement side. Those codes specify that the first
variable-length string will be output left-aligned to a width of 30
characters, the second variable-length field will be output left-
aligned to a width of 30 characters, and the third variable-length
field will be output right-aligned to a width of 30 characters
followed by a carriage return/line feed. What we have done in this
example is convert a file containing comma-delimited fields into a
file containing fixed-length fields.
The widths of the fields can be changed simply by changing the width
values in the search strings:
"\^*(35)\v","\^*(25)\v","\^*(40)\v"\0d\0a=\^L1\^L2\^R3\0d\0a
In this case, the replacement side of the equation is the same as
before, but the search side now contains different widths for each of
the fields. This will affect the appearance of the output when the
text is padded; it will not affect the appearance of the text if the
alignment codes are removed.
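In Python terms, left and right alignment correspond to str.ljust and str.rjust with the search-side maximum as the width. Here is a minimal sketch with a hypothetical pad_fields helper, using the 35/25/40 widths from the equation above.

```python
def pad_fields(fields, widths, aligns, pad=" "):
    r"""Pad each field to its \^*(width) maximum, like \^L1\^L2\^R3.

    aligns holds one "L" or "R" per field.  Fields are assumed to be no
    longer than their widths, since the search side cannot match more.
    """
    out = []
    for text, width, align in zip(fields, widths, aligns):
        out.append(text.ljust(width, pad) if align == "L" else text.rjust(width, pad))
    return "".join(out)

# Widths from "\^*(35)\v","\^*(25)\v","\^*(40)\v" and codes \^L1\^L2\^R3
line = pad_fields(["Smith", "John", "MN"], [35, 25, 40], "LLR")
print(len(line))  # 100
```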
PAD CHARACTER
By default, variable-length strings are padded with spaces when the R
or L alignment codes are specified. You can select a different padding
character by placing the setup code \\Axx at the beginning of your
table, where "xx" is a hex value representing the character you wish
to use as the padding character. Only one padding character may be
defined per table, and once selected, it cannot be changed.
As an example, you could pad your variable-length text strings with
periods by using the setup code:
\\A2E
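Because the pad character is given as a hex value, it is simply the character with that code. Here is a hypothetical helper that decodes the setup code and pads with the result.

```python
def pad_char_from_setup(code):
    r"""Return the pad character selected by a \\Axx setup code."""
    if len(code) != 5 or not code.startswith("\\\\A"):
        raise ValueError(r"expected a setup code of the form \\Axx")
    return chr(int(code[3:], 16))   # "xx" is a hex character value

pad = pad_char_from_setup(r"\\A2E")   # hex 2E is the period
print("Smith".ljust(10, pad))          # Smith.....
```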
HINTS AND TIPS
The conversion capabilities of SNR are very sophisticated. Some of
them are subtle, and require some study and experimentation to master.
Here is a list of a few pointers that might help you avoid the
pitfalls of inexperience.
1. Lots of smaller tables are better than one large table.
It is sometimes tempting to create a huge, do-it-all conversion
table that takes care of your conversion task all at once. Let me
tell you from experience that it is much better to break down
your conversion into a series of smaller tables that are run one
after the other.
One reason behind this is that it is much simpler to
pinpoint problems when you take your conversion a little bit at a
time, and you won't waste your energy poking around a large table
feebly trying to discover why your text isn't converting the way
you thought it should.
Remember that SNR allows you to run up to 20 tables at one
time on the command line anyway, giving you the same benefit as
having one large table.
Another reason to break your task into smaller pieces is
that some of the more advanced conversion models -- such as
variable-length strings -- may have unwanted side effects on
other equations or vice versa. It is safer to isolate the complex
portions of a conversion task into their own tables to ensure
that harmful side effects are not introduced.
2. Use literal characters over wild cards wherever possible.
Wild cards are very powerful, but they require much more time to
process than literal characters. Whenever you can, it is
preferable to define conversion equations using literal
characters instead of wild cards.
If your conversion task can be represented by a few well-
defined data patterns, it's a good candidate for literal
equations. If, on the other hand, you have a particular need to
search a wide range of data patterns, then the wild card equation
is your best choice.
Remember the question I posed back in the variable-length
section? I asked if there was a better way than using variable-
length strings to convert a comma-delimited text file into a tab-
delimited text file. The answer is yes. If you think about it a
little bit, there is no real need to do any conversion of the
data within the delimiters at all. Only the delimiters themselves
need to be converted. So rather than chewing up a lot of CPU
cycles scanning the data stream for variable-length text, the
following two equations will perform the conversion much faster:
"=
","=\09
An added benefit to using this table over the other one is that
this version does not need to be modified when the number of
fields changes, whereas the other one does. I guess this is one
of those situations where Less Is More.
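The two-equation table relies on the longest-match rule described in the wild card section. The quote-comma-quote string is tested before the lone quote, so each delimiter becomes a tab rather than losing only its quotes. Here is a minimal Python sketch of a single-pass, longest-first literal replacer. The function replace_all is hypothetical.

```python
def replace_all(text, table):
    """Single pass over text; at each position the longest matching
    search string wins, and unmatched characters pass through."""
    searches = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for s in searches:
            if text.startswith(s, i):
                out.append(table[s])
                i += len(s)
                break
        else:   # no search string matched at this position
            out.append(text[i])
            i += 1
    return "".join(out)

# The two-equation table:  "=   and   ","=\09
TABLE = {'"': "", '","': "\t"}
print(repr(replace_all('"Smith","John","MN"\r\n', TABLE)))  # 'Smith\tJohn\tMN\r\n'
```

Note that the same table handles four fields as readily as three, which is the flexibility described above.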
3. Use context flags rather than variable-length patterns when you
are seeking to compress the data stream.
This is another issue related to efficiency. Variable-length
conversion strings provide a means of reading an unspecified
number of consecutive characters and eliminating them, in effect
compressing the data stream. But equations using context flags
such as *ic or *ig provide a much faster conversion than
variable-length equations.
4. Debug your tables an equation at a time.
Debugging is a cross between an art and a science; it requires
both discipline and inspiration to do it well. The first step in
effective debugging is to find the trouble spot and focus your
attention on it.
The easiest way to do this in SNR is to place a comment code
(backslash followed by a space) in front of every equation in
your table, and enable each equation one at a time. Test a sample
of your data file for each new equation that you enable.
Eventually, you will hit upon the one equation that is fouling
things up, and you will be able to alter it or isolate it by
moving it to another table, where it can be dissected and
analyzed apart from other equations.
Once the trouble equation has been found, look at it
carefully. Consider these questions:
- Are you sure it is defined according to the rules set
forth in this document?
- Does each element of the equation logically match a likely
pattern in the data stream?
- Are your wild card definitions too vague to match a
specific pattern in the data stream?
- Are there any possible conflicts between elements of the
equation that would render it invalid?
- Can the equation be restated equivalently in different
terms, that is, using a different conversion model (for
example, transforming wild cards into literals)?
- Is what you are trying to do reasonable or even possible
in the SNR conversion paradigm?
- Have you done something like this before and made it work?
- Is there a similar equation shown in the sample tables?
- Could there be a bug in SNR?
Seeking the answers to these questions will help guide your
troubleshooting steps in a logical, scientific manner.
5. Use lots of comments.
This point was brought up in the first section of the manual, but
it bears repeating. 'Nuff said.
6. Peruse the provided example tables.
The range of possible uses of SNR, and the examples to support
them, are beyond the scope of this manual. Many of the more
complex and subtle uses of conversion strings can only be
appreciated by experimentation or by looking through the example
tables provided in the archive file.
The tables may provide you with hints on syntax and
usage that this manual could not cover. If you register the
program, you will be entitled to telephone or on-line support
(toll at your expense) and I will be happy to help you solve your
conversion problems.
MISCELLANEOUS
BIT STRIPPING
In prior versions of SNR, the default action was to map all incoming
characters to 7-bit ASCII before running them through the equations.
In version 5, this has been changed: no "bit-stripping" is performed
by default. If you wish to have your table map incoming characters to
a 7-bit ASCII set, you must enter the \\L7 setup code at the beginning
of your table.
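Assuming that \\L7 simply clears the high bit of each byte (the manual does not spell out the mapping), its effect resembles this Python sketch. The function name strip_to_7bit is hypothetical.

```python
def strip_to_7bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte (one plausible reading of the \\L7 code)."""
    return bytes(b & 0x7F for b in data)

print(strip_to_7bit(b"\xc1BC"))  # b'ABC'
```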
NOTES
The program will run on any IBM-compatible computer using DOS 3.0 or
higher, with a minimum of 512K RAM.
SNR was written in Borland C++ 3.1.
DISCLAIMER
This program is distributed as shareware. Use it, copy it, upload it,
give it to your friends. Please distribute only the complete program
in archived form, including all document and sample files. No
warranties, either expressed or implied, are given by the author or
distributor of the program, and the user accepts all risk of damage
arising out of the application and use of the program.
REGISTRATION
If you like SNR, please register your copy for $50. At that price,
it's one of the bargains of the text processing world, and I'll bet
you recoup that investment dozens, maybe hundreds, of times over in
the work it will save you.
Registered users will receive a version of the program that does not
have the compliance delay screen. Also included FREE with the
registered version is TurboEdit, a DOS text editor that provides a
multiple-window, mouse-supported editing environment with a built-in
ASCII table and calculator (very useful for editing your conversion
tables).
Please fill out the form in the file ORDER.FRM and return it with
your payment.
Registered users are guaranteed a response to technical support
questions by phone, mail, or CompuServe. Unregistered users will
receive responses on a time-available basis, but I can't guarantee
that I will respond to those requests.